Improving OCR Accuracy for Classical Critical Editions
Internal identifier: 000962 ( Main/Exploration ); previous: 000961; next: 000963
Authors: Federico Boschetti [United States] ; Matteo Romanello [United States] ; Alison Babeu [United States] ; David Bamman [United States] ; Gregory Crane [United States]
Source:
- Lecture Notes in Computer Science [ 0302-9743 ] ; 2009.
Abstract
This paper describes a workflow designed to populate a digital library of ancient Greek critical editions with highly accurate OCR-scanned text. While the most recently available OCR engines are now able, after suitable training, to deal with the polytonic Greek fonts used in 19th- and 20th-century editions, further improvements can also be achieved with postprocessing. In particular, the progressive multiple alignment method applied to different OCR outputs based on the same images is discussed in this paper.
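The idea of combining several OCR outputs of the same image can be illustrated with a small sketch. It is not the authors' implementation and not a full progressive multiple alignment: as a simplifying assumption, each output is aligned pairwise against the first one (the pivot) using Python's standard `difflib`, and a majority vote is taken at each pivot position. The function names and the transliterated sample strings are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

GAP = "\0"  # marks "no character aligned at this position"


def align_to_pivot(pivot, other):
    """For each position of `pivot`, return the aligned character of `other` or GAP."""
    aligned = [GAP] * len(pivot)
    matcher = SequenceMatcher(None, pivot, other, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep exact matches and same-length substitutions; insertions,
        # deletions and unequal replacements are left as gaps (simplification).
        if tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1):
            aligned[i1:i2] = other[j1:j2]
    return aligned


def vote(outputs):
    """Majority vote per pivot position; ties are resolved in favour of the pivot."""
    pivot = outputs[0]
    columns = [[ch] for ch in pivot]
    for other in outputs[1:]:
        for column, ch in zip(columns, align_to_pivot(pivot, other)):
            if ch != GAP:
                column.append(ch)
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


# Hypothetical outputs of three OCR engines for the same line (transliterated).
ocr_outputs = ["menin aelde thea", "menin aeide thea", "mcnin aeide thea"]
merged = vote(ocr_outputs)
print(merged)
```

Because the result always has the length of the pivot, this sketch can correct substitution errors but cannot recover characters the pivot is missing; a real multiple alignment would handle insertions and deletions as well.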
DOI: 10.1007/978-3-642-04346-8_17
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000008
- to stream Istex, to step Curation: 000008
- to stream Istex, to step Checkpoint: 000484
- to stream Main, to step Merge: 000970
- to stream Main, to step Curation: 000962
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Improving OCR Accuracy for Classical Critical Editions</title>
<author><name sortKey="Boschetti, Federico" sort="Boschetti, Federico" uniqKey="Boschetti F" first="Federico" last="Boschetti">Federico Boschetti</name>
</author>
<author><name sortKey="Romanello, Matteo" sort="Romanello, Matteo" uniqKey="Romanello M" first="Matteo" last="Romanello">Matteo Romanello</name>
</author>
<author><name sortKey="Babeu, Alison" sort="Babeu, Alison" uniqKey="Babeu A" first="Alison" last="Babeu">Alison Babeu</name>
</author>
<author><name sortKey="Bamman, David" sort="Bamman, David" uniqKey="Bamman D" first="David" last="Bamman">David Bamman</name>
</author>
<author><name sortKey="Crane, Gregory" sort="Crane, Gregory" uniqKey="Crane G" first="Gregory" last="Crane">Gregory Crane</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:E139A13B4800B4F0FC4DA869252849D648DB14FF</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-04346-8_17</idno>
<idno type="url">https://api.istex.fr/document/E139A13B4800B4F0FC4DA869252849D648DB14FF/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000008</idno>
<idno type="wicri:Area/Istex/Curation">000008</idno>
<idno type="wicri:Area/Istex/Checkpoint">000484</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Boschetti F:improving:ocr:accuracy</idno>
<idno type="wicri:Area/Main/Merge">000970</idno>
<idno type="wicri:Area/Main/Curation">000962</idno>
<idno type="wicri:Area/Main/Exploration">000962</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Improving OCR Accuracy for Classical Critical Editions</title>
<author><name sortKey="Boschetti, Federico" sort="Boschetti, Federico" uniqKey="Boschetti F" first="Federico" last="Boschetti">Federico Boschetti</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Romanello, Matteo" sort="Romanello, Matteo" uniqKey="Romanello M" first="Matteo" last="Romanello">Matteo Romanello</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Babeu, Alison" sort="Babeu, Alison" uniqKey="Babeu A" first="Alison" last="Babeu">Alison Babeu</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Bamman, David" sort="Bamman, David" uniqKey="Bamman D" first="David" last="Bamman">David Bamman</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Crane, Gregory" sort="Crane, Gregory" uniqKey="Crane G" first="Gregory" last="Crane">Gregory Crane</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">E139A13B4800B4F0FC4DA869252849D648DB14FF</idno>
<idno type="DOI">10.1007/978-3-642-04346-8_17</idno>
<idno type="ChapterID">17</idno>
<idno type="ChapterID">Chap17</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: This paper describes a work-flow designed to populate a digital library of ancient Greek critical editions with highly accurate OCR scanned text. While the most recently available OCR engines are now able after suitable training to deal with the polytonic Greek fonts used in 19th and 20th century editions, further improvements can also be achieved with postprocessing. In particular, the progressive multiple alignment method applied to different OCR outputs based on the same images is discussed in this paper.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Massachusetts</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Massachusetts"><name sortKey="Boschetti, Federico" sort="Boschetti, Federico" uniqKey="Boschetti F" first="Federico" last="Boschetti">Federico Boschetti</name>
</region>
<name sortKey="Babeu, Alison" sort="Babeu, Alison" uniqKey="Babeu A" first="Alison" last="Babeu">Alison Babeu</name>
<name sortKey="Bamman, David" sort="Bamman, David" uniqKey="Bamman D" first="David" last="Bamman">David Bamman</name>
<name sortKey="Crane, Gregory" sort="Crane, Gregory" uniqKey="Crane G" first="Gregory" last="Crane">Gregory Crane</name>
<name sortKey="Romanello, Matteo" sort="Romanello, Matteo" uniqKey="Romanello M" first="Matteo" last="Romanello">Matteo Romanello</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000962 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000962 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:E139A13B4800B4F0FC4DA869252849D648DB14FF |texte= Improving OCR Accuracy for Classical Critical Editions }}
This area was generated with Dilib version V0.6.32.